
Incremental Sampling-based Motion Planners Using Policy Iteration Methods



Abstract

Recent progress in randomized motion planners has led to the development of a new class of sampling-based algorithms that provide asymptotic optimality guarantees, notably the RRT* and the PRM* algorithms. Careful analysis reveals that the so-called "rewiring" step in these algorithms can be interpreted as a local policy iteration (PI) step (i.e., a local policy evaluation step followed by a local policy improvement step), so that asymptotically, as the number of samples tends to infinity, both algorithms converge to the optimal path almost surely (with probability 1). Policy iteration, along with value iteration (VI), is a common method for solving dynamic programming (DP) problems. Based on this observation, the RRT$^{\#}$ algorithm has recently been proposed, which performs, during each iteration, Bellman updates (aka "backups") on those vertices of the graph that have the potential of being part of the optimal path (i.e., the "promising" vertices). The RRT$^{\#}$ algorithm thus utilizes dynamic programming ideas and implements them incrementally on randomly generated graphs to obtain high-quality solutions. In this work, based on this key insight, we explore a different class of dynamic programming algorithms for solving shortest-path problems on random graphs generated by iterative sampling methods. This class of algorithms utilizes policy iteration instead of value iteration, and is thus better suited for massive parallelization. Contrary to the RRT* algorithm, the policy improvement during the rewiring step is not performed only locally but rather on a set of vertices that are classified as "promising" during the current iteration. This tends to speed up the whole process. The resulting algorithm, aptly named Policy Iteration-RRT$^{\#}$ (PI-RRT$^{\#}$), is the first of a new class of DP-inspired algorithms for randomized motion planning that utilize PI methods.
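As a rough illustration of the policy-iteration view described in the abstract, the following is a minimal Python sketch (not the paper's PI-RRT$^{\#}$ implementation): it alternates policy evaluation and policy improvement for a shortest-path problem on a fixed weighted digraph, with the improvement step restricted to a "promising" vertex set. The adjacency-dict/cost-dict layout, the function names, and the choice of the promising set are all illustrative assumptions, and the graph here does not grow with new samples as it would in an incremental planner.

```python
# Illustrative sketch only: policy iteration for shortest paths on a fixed
# weighted digraph with nonnegative edge costs. A "policy" maps each vertex
# to the successor it currently routes through (None = no successor yet).
import math


def policy_evaluation(graph, policy, goal, cost):
    """Cost-to-goal J under the current policy, via fixed-point sweeps."""
    J = {v: math.inf for v in graph}
    J[goal] = 0.0
    for _ in range(len(graph)):          # at most |V| sweeps propagate costs
        for v, succ in policy.items():
            if succ is not None and J[succ] < math.inf:
                J[v] = cost[(v, succ)] + J[succ]
    return J


def policy_improvement(graph, policy, J, cost, promising):
    """One-step greedy lookahead, applied only to the 'promising' vertices."""
    changed = False
    for v in promising:
        best_succ, best_val = policy[v], J[v]
        for u in graph[v]:               # candidate successors of v
            val = cost[(v, u)] + J[u]
            if val < best_val:
                best_succ, best_val = u, val
        if best_succ != policy[v]:       # rewire v through a cheaper successor
            policy[v] = best_succ
            changed = True
    return changed


def shortest_path_pi(graph, cost, goal, promising):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = {v: None for v in graph}
    while True:
        J = policy_evaluation(graph, policy, goal, cost)
        if not policy_improvement(graph, policy, J, cost, promising):
            return policy, J
```

The sketch only captures the evaluation/improvement alternation on a static graph; in the incremental setting the abstract describes, new samples keep extending the graph, and it is the improvement step over the whole promising set (rather than a purely local rewiring) that lends itself to parallelization.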
